The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practices and bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%), and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
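As an illustration of the k-fold cross-validation scheme the survey asked about (used by 37% of participants), here is a minimal, framework-free sketch of how training data is partitioned into k train/validation splits; the helper name and pure-Python form are ours, not from the survey.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k (train, validation) partitions.

    Every sample appears in exactly one validation fold; fold sizes
    differ by at most one when n_samples is not divisible by k.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = indices[start:start + size]          # held-out fold
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds

folds = k_fold_indices(10, 5)  # 5 folds over 10 samples
```

In practice each (train, val) pair would be used to fit one model, and the resulting models could then be ensembled, as roughly half of the surveyed participants did.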
Test-time adaptation (TTA) has attracted significant attention due to its practical properties, which enable the adaptation of a pre-trained model to a new domain using only the target dataset during the inference stage. Prior works on TTA assume that the target dataset comes from a single distribution and thus constitutes a single homogeneous domain. In practice, however, the target domain can contain multiple homogeneous domains that are sufficiently distinct from each other, and these multiple domains may occur cyclically. Our preliminary investigation shows that domain-specific TTA outperforms vanilla TTA, which treats the compound domain (CD) as a single one. However, domain labels are not available in the CD setting, which makes domain-specific TTA impracticable. To this end, we propose an online clustering algorithm that finds pseudo-domain labels, obtaining benefits similar to a domain-specific configuration while effectively accumulating knowledge of cyclic domains. Moreover, we observe a significant discrepancy in prediction quality among samples, especially in the CD context. This further motivates us to boost performance with gradient denoising that considers the image-wise similarity to the source distribution. Overall, the key contribution of our work lies in proposing a significant new task, compound domain test-time adaptation (CD-TTA), for semantic segmentation, as well as providing a strong baseline to facilitate benchmarking in future work.
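The online clustering idea for pseudo-domain labels can be sketched roughly as follows: each incoming sample's feature vector is assigned to the nearest domain centroid, which is then updated with an exponential moving average. This is our hypothetical illustration of the general idea; the paper's actual algorithm, class name, and update rule may differ.

```python
import math

class OnlineDomainClusterer:
    """Illustrative online clusterer producing pseudo-domain labels.

    Centroids are lazily initialized from the first samples, then each
    new feature is labeled with its nearest centroid, which drifts
    toward the sample via an exponential moving average.
    """

    def __init__(self, num_domains, dim, momentum=0.9):
        self.centroids = [[0.0] * dim for _ in range(num_domains)]
        self.initialized = [False] * num_domains
        self.momentum = momentum

    def assign(self, feat):
        # Seed any still-empty centroid with the incoming sample.
        for i, ready in enumerate(self.initialized):
            if not ready:
                self.centroids[i] = list(feat)
                self.initialized[i] = True
                return i
        # Otherwise label the sample with the nearest centroid ...
        dists = [math.dist(feat, c) for c in self.centroids]
        label = dists.index(min(dists))
        # ... and move that centroid toward the sample (EMA update),
        # which lets the clusterer track cyclically recurring domains.
        m = self.momentum
        self.centroids[label] = [m * c + (1 - m) * f
                                 for c, f in zip(self.centroids[label], feat)]
        return label
```

A model could then keep per-label adaptation statistics (e.g., separate batch-norm state per pseudo-domain), which is the benefit of the domain-specific configuration the abstract describes.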
Computational pathology can help save human lives, but models are annotation-hungry and pathology images are notoriously expensive to annotate. Self-supervised learning (SSL) has been shown to be an effective method for utilizing unlabeled data, and its application to pathology could greatly benefit downstream tasks. Yet, there are no principled studies that compare SSL methods and discuss how to adapt them for pathology. To address this need, we execute the largest-scale study of SSL pre-training on pathology image data to date. Our study is conducted using four representative SSL methods on diverse downstream tasks. We establish that large-scale domain-aligned pre-training in pathology consistently outperforms ImageNet pre-training in standard SSL settings such as linear and fine-tuning evaluations, as well as in low-label regimes. Moreover, we propose a set of domain-specific techniques that we experimentally show lead to a performance boost. Lastly, for the first time, we apply SSL to the challenging task of nuclei instance segmentation and show large and consistent performance improvements under diverse settings.
Object removal and image inpainting in facial images is a task in which objects occluding a facial image are specifically targeted, removed, and replaced by a properly reconstructed face. Two distinct approaches, one leveraging a U-Net and the other a modulated generator, have each been recognized for unique advantages on this task, despite the innate drawbacks of each. The U-Net, a conventional approach for conditional GANs, preserves the fine details of the unmasked regions, but the style of the reconstructed region is inconsistent with the rest of the image, and it is robust only when the size of the occluding object is sufficiently small. In contrast, the modulated generative approach can handle a larger occluded area in an image and provides a more consistent style, yet it usually misses most of the detailed features. This trade-off between the two models calls for the invention of a model that can be applied to masks of any size while maintaining a consistent style and preserving the fine details of facial features. Here, we propose the Semantic-Guided Inpainting Network (SGIN), itself a modification of the modulated generator, which aims to exploit its advanced generative capability while retaining the high-fidelity details of the original image. Guided by a semantic map, our model is able to manipulate facial features, which gives direction to the one-to-many problem for further practical use.
Displacement is an important measurement for assessing structural condition, but difficulties related to sensor installation and measurement accuracy often hinder its measurement in the field. To overcome the shortcomings of conventional displacement measurement, computer vision (CV)-based methods have been deployed owing to their remote-sensing capability and accuracy. This paper proposes a strategy for non-target structural displacement measurement that uses CV to avoid the need to install targets on the structure, while calibrating the displacement with structured light. The proposed system, called Lavolution, computes the relative position of the camera with respect to the structure using four equidistant beams of structured light and obtains a scale factor that converts pixel motion into structural displacement. A jig for the four structured-light beams is designed, and a corresponding alignment process is proposed. A method for computing the scale factor using the designed jig is presented and verified through numerical simulations and lab-scale experiments. To confirm the feasibility of the proposed displacement-measurement process, shaking-table and full-scale bridge experiments were conducted, and the accuracy of the proposed method was compared with that of a reference laser Doppler vibrometer.
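The core of any such CV-based measurement is the pixel-to-displacement conversion via a scale factor. The following is only a simplified sketch of that conversion, assuming a known physical spacing between reference points imaged on the measured plane; Lavolution's actual scale-factor computation from the four structured-light beams is more involved.

```python
def scale_factor(known_spacing_mm, pixel_spacing_px):
    """Scale factor (mm per pixel) from a known physical spacing and
    its measured spacing in the image, assuming the reference points
    lie on the measured plane and motion is in-plane."""
    return known_spacing_mm / pixel_spacing_px

def pixel_to_displacement(pixel_motion_px, known_spacing_mm, pixel_spacing_px):
    """Convert measured pixel motion into physical displacement (mm)."""
    return pixel_motion_px * scale_factor(known_spacing_mm, pixel_spacing_px)

# Example: reference points 100 mm apart appear 50 px apart,
# so a 10 px image motion corresponds to a 20 mm displacement.
d = pixel_to_displacement(10.0, known_spacing_mm=100.0, pixel_spacing_px=50.0)
```

This simple ratio is exact only when the camera axis is perpendicular to the motion plane; correcting for the camera's relative pose is precisely what the structured-light beams in the proposed system are for.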
Online trolls increase social costs and cause psychological harm to individuals. With the proliferation of automated accounts that exploit bots for trolling, it is difficult for targeted individual users to handle the situation both quantitatively and qualitatively. To address this issue, we focus on automating the response to trolls, since counter-responses to trolling encourage community users to maintain ongoing discussion without compromising freedom of expression. To this end, we propose a novel dataset for automatic counter-response generation. In particular, we build a paired dataset consisting of troll comments and counter-responses annotated with labeled response strategies, which enables models trained on our dataset to generate responses by varying the counter-response according to a specified strategy. We conducted three tasks to assess the effectiveness of our dataset and evaluated the results through both automatic and human evaluation. In the human evaluation, we demonstrate that models fine-tuned on our dataset show a significant improvement in strategy-controlled sentence generation.
This project involved participation in the DCASE 2022 Competition (Task 6), which has two subtasks: (1) automated audio captioning and (2) language-based audio retrieval. The first subtask concerns the generation of textual descriptions for audio samples, while the goal of the second is to find audio samples within a fixed dataset that match a given description. The Clotho dataset was used for both subtasks. The models were evaluated with BLEU1, BLEU2, BLEU3, ROUGE-L, METEOR, CIDEr, SPICE, and SPIDEr scores for audio captioning, and with R1, R5, R10, and mAP10 scores for audio retrieval. We conducted several experiments that modify the baseline models for these tasks. Our final architecture for automated audio captioning approaches the baseline performance, while our language-based audio retrieval model has surpassed its counterpart.
Understanding visual and linguistic representations of product content is vital for search and recommendation applications in e-commerce. As the backbone of an online shopping platform, and inspired by recent successes in representation-learning research, we propose a contrastive learning framework that aligns language and vision models using unlabeled raw product text and images. We present the techniques we used to train a large-scale representation-learning model and share solutions to domain-specific challenges. We study the pre-trained model as the backbone for a variety of downstream tasks, including category classification, attribute extraction, product matching, product clustering, and adult-product recognition. Experimental results show that our proposed method outperforms baselines of each single modality and of multiple modalities on every downstream task.
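Contrastive image-text alignment of this kind is typically trained with a symmetric InfoNCE objective: matched product image/text pairs sit on the diagonal of a similarity matrix and are pulled together while mismatched pairs are pushed apart. Below is a minimal pure-Python sketch of that loss, assuming precomputed similarities; the paper's actual loss, temperature, and batching may differ.

```python
import math

def info_nce_loss(sim_matrix, temperature=0.07):
    """Symmetric InfoNCE over a square image-text similarity matrix.

    sim_matrix[i][j] is the similarity of image i and text j; matched
    pairs lie on the diagonal. Lower loss means better alignment.
    """
    n = len(sim_matrix)

    def directional_loss(rows):
        # Cross-entropy with the diagonal entry as the positive class.
        total = 0.0
        for i, row in enumerate(rows):
            logits = [s / temperature for s in row]
            log_denom = math.log(sum(math.exp(l) for l in logits))
            total += log_denom - logits[i]
        return total / n

    cols = [list(col) for col in zip(*sim_matrix)]  # text-to-image direction
    return 0.5 * (directional_loss(sim_matrix) + directional_loss(cols))
```

A well-aligned batch (large diagonal similarities) yields a much lower loss than an indifferent one, which is what drives the image and text encoders toward a shared embedding space.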
Learning unbiased node representations under class-imbalanced graph data is challenging due to the interactions between adjacent nodes. A common trait of existing studies is that they compensate the minor-class nodes "as a group" according to their overall number, ignoring node connectivity in the graph, which inevitably increases the false-positive cases of the major-class nodes. We hypothesize that the increase in these false-positive cases is highly influenced by the label distribution around each node, and we confirm this experimentally. In addition, to address this issue, we propose Topology-Aware Margin (TAM), which reflects the local topology in the learning objective. Our method compares the connectivity pattern of each node with its class-averaged counterpart and adaptively adjusts the margin accordingly. Our method consistently demonstrates superiority over baselines on various node-classification benchmark datasets with representative GNN architectures.
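The margin adjustment can be pictured as follows: for each class, compare the fraction of that class among a node's neighbors with the class-averaged fraction, and shift the per-class margin by the deviation. This is a hypothetical sketch of the general idea only; TAM's actual formulation, sign convention, and scaling in the paper may differ.

```python
def topology_aware_margins(neighbor_label_fracs, class_avg_fracs, alpha=1.0):
    """Illustrative per-class margin adjustments for one node.

    neighbor_label_fracs: fraction of each class among this node's neighbors.
    class_avg_fracs: the class-averaged connectivity pattern to compare against.
    A positive output enlarges the margin for that class (the node's local
    topology supports it less than average); negative shrinks it.
    """
    return [alpha * (avg - local)
            for local, avg in zip(neighbor_label_fracs, class_avg_fracs)]

# A node whose neighborhood is dominated by class 0 (0.8 vs. the 0.5 average)
# gets a reduced margin for class 0 and an enlarged one for class 1.
margins = topology_aware_margins([0.8, 0.2], [0.5, 0.5])
```

These adjustments would then be added to the class logits inside the training loss, so nodes whose neighborhoods deviate from the typical pattern of their class are penalized differently from typical ones.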
Perception of traversable regions and objects of interest from a 3D point cloud is one of the critical tasks in autonomous navigation. A ground vehicle needs to look for traversable terrain that can be explored by its wheels. Then, to make safe navigation decisions, the objects located on those terrains must be segmented. However, over-segmentation and under-segmentation can negatively affect such navigation decisions. To this end, we propose TRAVEL, which performs traversable ground detection and object clustering using a graph representation of a 3D point cloud. To segment the traversable ground, the point cloud is encoded into a graph structure, the tri-grid field, which treats each tri-grid as a node. Traversable regions are then searched and refined by checking the local convexity and concavity of the edges connecting the nodes. Our above-ground object segmentation, in turn, builds a graph by representing sets of horizontally adjacent 3D points in a spherical-projection space as nodes and the vertical/horizontal relationships between nodes as edges. Fully exploiting this node-edge structure, the above-ground segmentation ensures real-time operation and mitigates over-segmentation. Through experiments using simulations, urban scenes, and our own dataset, we demonstrate that our proposed traversable-ground segmentation algorithm outperforms other state-of-the-art methods in terms of conventional metrics, and that our newly proposed evaluation metric is meaningful for assessing above-ground segmentation. We will make the code and our own dataset publicly available at https://github.com/url-kaist/travel.
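A local convexity check between two adjacent surface patches (each summarized by a centroid and a unit normal) can be sketched as follows: the connection is treated as convex, and hence plausibly traversable, when neither normal leans toward the other patch. This is our simplified illustration of the kind of edge test the abstract describes; TRAVEL's exact criterion and thresholds may differ.

```python
def is_locally_convex(c1, n1, c2, n2, tol=1e-6):
    """Illustrative convexity test for the edge between two patches.

    c1, c2: patch centroids (3D points); n1, n2: unit surface normals.
    The pair is convex (or flat) when each normal has a non-positive
    component along the direction toward the other patch.
    """
    d = [b - a for a, b in zip(c1, c2)]          # vector from patch 1 to 2
    dot1 = sum(x * y for x, y in zip(n1, d))     # n1 leaning toward patch 2?
    dot2 = sum(x * y for x, y in zip(n2, [-x for x in d]))  # n2 toward patch 1?
    return dot1 <= tol and dot2 <= tol

# Flat ground: two patches with upward normals form a convex connection.
flat = is_locally_convex([0, 0, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1])
```

Edges that fail this test (a concave crease, e.g., ground meeting a wall) would mark the boundary of the traversable region during the search-and-refine step.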